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ABSTRACT 

Summary: The Smith-Waterman (SW) algorithm, which produces 
the optimal pairwise alignment between two sequences, is frequently 
used as a key component of fast heuristic read mapping tools and 
variation detecting tools. To allow the fast Single-lnstruction-Multiple- 
Data (SIMD) SW been widely used in third-party software, we wrote a 
C/C++ library, which extends Farrar's Striped SW (SSW) to return the 
full information of the optimal alignment, and necessary information 
of a next-optimal alignment while keeping the same speed as the 
original algorithm that only returns the optimal score. 
Availability: The SSW library and its corresponding alignment 
tool are available at: https://github.com/mengyao/Complete-Striped- 
Smith-Waterman-Library 
Contact: zhangmp@bc.edu 
Supplementary Information: 



1 INTRODUCTION 

The time complexity and memory requirement of the original 
SW algorithm ( |Smith and Waterman |1981| > make it impractical 
for mapping sequencing reads to large (e.g. mammalian) genome 
reference sequences. Several improvements have been made 
that result in substantial acceleration, using e.g. SIMD (Alpern 
\et al. 



1995b and Graphics Processing Unit (GPU) instructions 



i Liu et al.\ |2009| |20l0l |Ligowski et al.\ |20TT) , with Farrar's 
SSW implementation ( |Farrar| |2007| > achieving the fastest internal 
parallelization on the widely available SIMD processors. 

With the exception of the SSEARCH program ( |Pearson| |1991[ l, 
however, existing accelerated SW implementations only return the 
optimal SW alignment score, rendering them less useful for read 
mapping where the objective is to obtain the map location and 
detailed alignment of a short sequence fragment to a typically 
much longer reference sequence. It is often also useful to know 
whether or not the alignment is "unique" i.e. whether there is a 
single best read mapping position. This question can be answered 
by obtaining the best alternative alignment to see whether its score 
is the same as (in which case the read mapping is not unique) 
or lower than (in which case the read mapping is unique) that of 
the optimal alignment. Although the latest version of SSEARCH 
( |Pearson| 1 199 1 [ > can return detailed alignment corresponding to 
the optimal score, it is much slower than even unaccelerated 
SW implementations for short reference sequences (Supplemental 
Figure SI). Furthermore, because the software is not written in 
a library format it is cumbersome to integrate into third-party 

* To whom correspondence should be addressed. 



software. The aim of the present work is to provide a fast SW 
implementation with all essential output necessary for use in read 
mapping tools that is readily integrated with other C/C++ software. 



2 ALGORITHM AND IMPLEMENTATION 

We adopted the improvement of the "lazy F loop" proposed by SWPS3 
jSzalkowski et al\ [20081 to speed up Farrar's method. Besides this, we 
made the new following additions: (1) We record the optimal alignment 
ending positions during the SIMD SW calculation, and generate the detailed 
alignment by a reversed SIMD SW and a banded SW. When the score matrix 
is filled, we store the maximal score of each column in a "max" array, and 
record the complete column that has the maximal score. Next, we locate 
the optimal alignment ending position on the reference and the query by 
seeking the maximal score in the array and the recorded column, resp.. The 
reversed SIMD SW locates the best alignment beginning position from the 
ending position by calculating a smaller scoring matrix. Then, the banded 
SW (whose band is defined by the beginning and ending positions) generates 
the detailed alignment. (2) We determine a next optimal alignment score by 
seeking the second largest score in the "max" array. To avoid a sub-alignment 
of the primary alignment returned, we mask the elements in the region of the 
primary alignment of the "max" array and locate the second largest score 
from the unmasked elements (Figure[TJ. 



3 RESULTS 

3.1 Usage 

The SSW library is an application program interface (API) that can 
be used as a component of C/C++ software to perform optimal 
protein or genome sequence alignment. The library returns the SW 
score, alignment location and traceback of the optimal alignment, 
and the alignment score and location of the suboptimal alignment. 
We bundled the library as an executable alignment tool that can be 
used directly to perform protein or DNA alignments. Notably, this 
tool can be used to align sequencing reads to very large reference 
genome sequences e.g. the human genome. 

3.2 Performance 

We compared SSW's performance (with and without returning the 
detailed alignment, SSW-C and SSW, resp.) to Farrar's accelerated 
SW and SSEARCH (version 36.3.5c) on a Linux machine with 800 
MHz x86_64 AMD processors. We ran each program on a single 
thread. 

To measure the speed of protein database searching, we aligned 
protein sequences of varying length (60-2,432 aa) against the 
UniProt Knowledgebase release 2012_01 (including Swiss-Prot 
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Fig. 1. Illustration of alignment traceback and suboptimal alignment score 
determination. An example SW score matrix is shown (penalties for match, 
mismatch, gap open and extension are 2,-1, -2, and -1 resp.). The bottom 
row indicates the maximum score for each column. The algorithm locates 
the optimal alignment ending position (the black cell with score 9) using 
the array of maximum scores, and then traces back to the alignment start 
position (the black cell with score 2) by searching a much smaller, locally 
computed score matrix (circled by the black rectangle). Finally, a banded 
SW calculates the detailed alignment by searching the shaded sub-region. 
The scores connected by solid arrows belong to the optimal alignment. 
The max array records the largest score of each column. After the optimal 
alignment score (marked by 'best') is found, its neighborhood is masked, and 
the second largest score is reported outside the masked region (marked by 
2nd best). The scores connected by dashed-line arrows trace the suboptimal 
alignment. 



and TrEMBL, a total of 6,751,887,709 aa residues in 20,662,136 
sequences), for all four algorithms (see Figure |2f A)). SSEARCH 
did not return alignment results against the entire UniProt database, 
and we were only able to test it against one half of the TrEMBL 
sequences. Our SSW algorithm was the fastest or equally fast 
to SSEARCH across the entire protein sequence length range we 
tested. 

To benchmark genome sequence alignment, we tested the 
programs with both simulated data and real sequencing reads. We 
selected 1Kb - 10Mb regions from human chromosome 8, and using 
an Illumina read simulator (http://www.seqan.de/projects/mason/) 
we generated a thousand 100 bp-long sequences from these regions. 
We then aligned these reads back to their corresponding reference 
sequences with each of the four algorithms, and compared running 
time (see Figure |2jB)). Finally, we compared running times 
using four sets of a thousand real sequencing reads representing 
three different sequencing technologies, aligned to four different 
reference genomes: (1) ABI capillary reads (1,388 bp average 
length) against the severe acute respiratory syndrome (SARS) 
virus (29,751 bp); (2) Ion Torrent reads (236 bp) against E. coli 
(4.94 x 10 3 bp); (3) Illumina reads (100 bp) against T. gondii 
(6.08 x 10 7 bp); and (4) Illumina reads against human chromosome 
1 (2.49 x 10 8 bp) as shown in Figure^C). The genome alignment 
performance of these programs under another set of SW parameters 
is shown in Supplemental Figure SI. These results indicate that even 
while returning a full optimal alignment and one suboptimal score, 
our SSW algorithm is just as fast as Farrar's accelerated version. 



Fig. 2. Running time for different SW implementations. Log-scaled running 
time is shown on the y-axis for SSW without (red) and with (blue) detailed 
alignment, Farrar's implementation (green) and SSEARCH (pink). (A) 
Running times are shown for searching protein sequences against the full 
UniProt database (left) and one half of the TrEMBL database (right). All 
SW implementations used the BLOSUM50 scoring matrix with gap open 
penalty -12 and extension penalty -2. (B) The running time of aligning 1,000 
simulated Illumina reads to human reference sequences of various lengths; 
and (C) of aligning 1,000 real sequencing reads to various microorganism 
genomes and the human chromosome 1 are shown. Farrar's implementation 
cannot handle long sequences as human chromosome 1 , so its corresponding 
running time is not shown here. 



4 CONCLUSION 

We developed and made available a fast SW library using an SIMD 
acceleration. By returning not only optimal alignment score but also 
the actual alignment, as well as a secondary optimal or suboptimal 
alignment score, the SSW library is easily adopted by other, 
often heuristic genomic sequence analysis programs that require 
global or local SW alignment. SSW has already been adopted 
in two programs developed by our group: the split-read mapping 
program SCISSORS (https://github.com/wanpinglee/scissors) and 
the mobile element insertion (MEI) detector program TANGRAM 
(https : // gi thub. com/j iantao/Tangram) . 
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